To learn about the past from a sample of genomic sequences, one needs to understand how 
evolutionary processes shape genetic diversity. Most population genetic inference is based on 
frameworks assuming adaptive evolution is rare. But if positive selection operates on many loci 
simultaneously, as has recently been suggested for many species including animals such as flies, 
a different approach is necessary. In this review, I discuss recent progress in characterizing and 
understanding evolution in rapidly adapting populations where random associations of mutations 
with genetic backgrounds of different fitness, i.e., genetic draft, dominate over genetic drift. As 
a result, neutral genetic diversity depends weakly on population size, but strongly on the rate of 
adaptation or more generally the variance in fitness. Coalescent processes with multiple merg- 
ers, rather than Kingman's coalescent, are appropriate genealogical models for rapidly adapting 
populations with important implications for population genetic inference. 
I. INTRODUCTION 



Neutral diffusion or coalescent models (Kimura 1964; Kingman 1982) predict that genetic diversity at unconstrained
sites is proportional to the (effective) population size N, for a simple reason: two randomly chosen
individuals have a common parent with a probability of order 1/N, and the first common ancestor of two individuals
lived of order N generations ago. Forward in time, this neutral coalescence corresponds to genetic drift. However,
the observed correlation between genetic diversity and population size is rather weak (Leffler et al. 2012; Lewontin
1974), implying that processes other than genetic drift dominate coalescence in large populations. This notion is
reinforced by the observation that pesticide resistance in insects can evolve independently on multiple genetic
backgrounds (Karasov et al. 2010; Labbe et al. 2007) and can involve several adaptive steps in rapid succession
(Schmidt et al. 2010). This high mutational input suggests that the short-term effective population size of
D. melanogaster is greater than $10^9$ and conventional genetic drift should be negligible. Possible forces that
accelerate coalescence and reduce diversity are purifying and positive selection. Historically, the effects of
purifying selection have received most attention (reviewed by Charlesworth (2012)), and my focus here will be on
the role of positive selection.
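The 1/N argument for neutral coalescence can be checked with a few lines of Python. The following is a minimal sketch and not one of the scripts accompanying this review: it assumes a haploid Wright-Fisher population of constant size N in which every lineage picks its parent uniformly at random, and the function name `pairwise_coalescence_times` is hypothetical. Under these assumptions, the mean time to the common ancestor of two lineages should be close to N generations.

```python
import numpy as np


def pairwise_coalescence_times(N, replicates, rng):
    """Generations until two lineages find a common parent, for many replicate pairs."""
    times = np.zeros(replicates, dtype=np.int64)
    active = np.ones(replicates, dtype=bool)  # pairs that have not coalesced yet
    t = 0
    while active.any():
        t += 1
        idx = np.flatnonzero(active)
        # Going one generation back, each lineage picks one of N parents at random.
        parent_a = rng.integers(N, size=idx.size)
        parent_b = rng.integers(N, size=idx.size)
        merged = idx[parent_a == parent_b]  # same parent: the pair coalesces
        times[merged] = t
        active[merged] = False
    return times


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for N in (100, 200, 400):
        mean_t = pairwise_coalescence_times(N, 20000, rng).mean()
        print(f"N = {N:4d}: mean pairwise coalescence time = {mean_t:.1f} generations")
```

In this neutral sketch, doubling N roughly doubles the mean coalescence time. This is the proportionality between diversity and population size that the observations cited above fail to show.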



A selective sweep reduces nearby polymorphisms through hitch-hiking. Polymorphisms linked to the sweeping allele
are brought to higher frequency, while others are driven out (Maynard Smith and Haigh 1974). Linked selection
not only reduces diversity, but also slows down adaptation in other regions of the genome, an effect known as
Hill-Robertson interference (Hill and Robertson 1966). Hill-Robertson interference has been intensively studied in
two-locus models (Barton 1994), where the effect is quite intuitive: two linked beneficial mutations arising in different
individuals compete, and the probability that both mutations fix increases with the recombination rate between the
loci. Pervasive selection, however, requires many-locus models. Here, I will review recent progress in understanding
how selection at many loci limits adaptation and shapes genetic diversity. Linked selection is most pronounced in 
asexual organisms. The theory of asexual evolution is partly motivated by evolution experiments with microbes, which 
have provided us with detailed information about the spectrum of adaptive molecular changes and their dynamics. 
I will then turn to facultatively sexual organisms which include many important human pathogens such as HIV and 
influenza as well as some plants and nematodes. Finally, I will discuss obligately sexual organisms, where the effect 
of linked selection is dominated by nearby loci on the chromosome. 

The common aspect of all these models is the source of stochastic fluctuations: random associations with back- 
grounds of different fitness. In contrast to genetic drift, such associations persist for many generations, which amplifies 
their effect. In analogy to genetic drift, the fluctuations in allele frequencies through linked selection have been termed 
genetic draft (Gillespie 2000). The (census) population size determines how readily adaptive mutations and
combinations thereof are discovered but has little influence on coalescent properties and genetic diversity. Instead, selection
determines genetic diversity and sets the time scale of coalescence. The latter should not be rebranded as $N_e$, as this
suggests that a rescaled neutral model is an accurate description of reality. In fact, many features are qualitatively 
different. Negligible drift does not imply that selection is efficient and only beneficial mutations matter. On the con- 
trary, deleterious mutations can reach high frequency through linkage to favorable backgrounds and the dynamics of 
genotype frequencies in the population remains very stochastic. Genealogies of samples from populations governed by 
draft do not follow the standard binary coalescent process. Instead coalescent processes allowing for multiple mergers 
seem to be appropriate approximations which capture the large and anomalous fluctuations associated with selection. 
Those coalescent models thus form the basis for a population genetics of rapid adaptation and serve as null-models 
to analyze data when Kingman's coalescent is inappropriate. To illustrate clonal interference, draft, and genealogies
in the presence of selection, this review is accompanied by a collection of scripts based on FFPopSim (Zanini and Neher
2012) at webdav.tuebingen.mpg.de/interference.



II. ADAPTATION OF LARGE AND DIVERSE ASEXUAL POPULATIONS 



Evolution experiments (reviewed in Burke (2012); Kawecki et al. (2012)) have demonstrated that adaptive evolution
is ubiquitous among microbes. Experiments with RNA viruses have shown that the rate of adaptation increases only
slowly with the population size (Miralles et al. 1999; de Visser et al. 1999), suggesting that adaptation is limited
by competition between different mutations and not by the availability of beneficial mutations. The competition
between clones, also known as clonal interference, was directly observed in E. coli populations using fluorescent
markers (Hegreness et al. 2006). Similar observations have been made in Rich Lenski's experiments in which E. coli
populations were followed for more than 50,000 generations (Barrick et al. 2009). A different experiment selecting > 100
E. coli populations for heat tolerance has shown that there are 1000s of sites available for adaptive substitutions, that
there is extensive parallelism among lines in the genes and pathways bearing mutations, and that mutations frequently
interact epistatically (Tenaillon et al. 2012). By following the frequencies of microsatellite markers in populations
of E. coli, Perfeito et al. (2007) estimated the beneficial mutation rate to be $U_b \sim 10^{-5}$ per genome and generation
with average effects of about 1%. Similarly, it has been shown that beneficial mutations are readily available in yeast
and compete with each other in the population for fixation (Desai et al. 2007; Kao and Sherlock 2008; Lang et al.
2011). At any given instant, the population is thus characterized by a large number of segregating clones giving
rise to a broad fitness distribution (Desai et al. 2007). The fate of a novel mutation is mainly determined by the
genetic background it arises on (Lang et al. 2011). Similar rapid adaptation and competition is observed in the global
populations of influenza, which experience several adaptive substitutions per year (Bhatt et al. 2011; Smith et al.
2004; Strelkowa and Lässig 2012), mainly driven by immune responses of the host. In summary, evolution of asexual
microbes does not seem to be limited by finding the necessary single point mutations, but rather by overcoming clonal
interference and combining multiple mutations.

These observations have triggered intense theoretical research on clonal interference and adaptation in asexuals. In 
the models studied, rare events, e.g. the fittest individual acquiring additional mutations, dramatically affect the future 
dynamics. Intuition is a poor guide in such situations and careful mathematical treatment is warranted. Nevertheless,
it is often possible to rationalize the results in a simple and intuitive way with hindsight, and I will try to present the
important aspects in accessible form.

[Figure 1 panels: A) diverse population; B) fitness distribution]

FIG. 1 Fitness and mutational effect distributions. (a) A genetically diverse population will typically harbor variation in
fitness. If many mutations have comparable effects on fitness, the resulting fitness distribution is smooth and roughly normal
(part b, top). If a small number of large effect mutations exist, the distribution is multi-modal (part b, bottom). Mutational
effects across the genome are believed to follow a distribution roughly like the one sketched in panel (c). A small fraction
of mutations are beneficial, the majority are neutral or deleterious, and some are lethal. The integral over U(s) is the total
mutation rate U. In models of adaptive evolution, the high fitness tail of U(s), shown in the inset, is the most important
part. If it falls off faster than exponentially, the fitness distribution tends to be smooth. Otherwise, the distribution is often
dominated by a few large effect mutations.

Our discussion assumes that fitness is a unique function of the genotype. Thereby, we ignore the possibility of 
frequency-dependent selection. A diverse population with many different genotypes can then be summarized by its 
distribution along this fitness axis; see Fig. 1A and B. Fitness distributions are shaped by a balance between injection
of variation via mutation and the removal of poorly adapted variants. Most mutations have detrimental effects on
fitness, while only a small minority of mutations is beneficial. The distribution of mutational effects in RNA viruses
has been estimated by mutagenesis (Lalic et al. 2011; Sanjuan et al. 2004). Roughly half of random mutations are
effectively lethal, while 4% were found to be beneficial in this experiment. A distribution of mutational effects, U(s),
is sketched in Fig. 1C. General properties of U(s) are largely unknown and will depend on the environment.

Deleterious mutations rarely reach high frequencies but are numerous, while beneficial mutations are rare but 
amplified by selection. But in order to spread and fix, a beneficial mutation has to arise on an already fit genetic 
background or have a sufficiently large effect on fitness to get ahead of everybody else. Two lines of theoretical work
have put emphasis either on large effect mutations (clonal interference theory) or on "coalitions" of multiple mutations
of similar effect. Both approaches, sketched in Fig. 2, are good approximations depending on the distribution of fitness
effects.



A. Clonal Interference 



Consider a homogeneous population in which mutations with effect on fitness between s and s + ds arise with
rate U(s)ds as sketched in Fig. 1C. In a large population many beneficial mutations arise every generation. In order
to fix, a beneficial mutation has to outcompete all others; see Fig. 2A. In other words, a mutation fixes only if no
mutation with a larger effect arises before it has reached high frequencies in the population. This is the essence
of clonal interference theory by Gerrish and Lenski (1998). The Gerrish-Lenski theory of clonal interference is an
approximation since it ignores the possibility that two or more mutations with moderate effects combine to outcompete
a large effect mutation, a process I will discuss below. Its accuracy depends on the functional form of U(s) and the
population size (Park and Krug 2007). One central prediction of clonal interference is that the rate of adaptation
increases only slowly with the population size N and the beneficial mutation rate $U_b$. This is a consequence of the
fact that the probability that a particular mutation is successful decreases with $NU_b$ since there are more mutations
competing. This basic prediction has been confirmed in evolution experiments with viruses (Miralles et al. 1999, 2000;
de Visser et al. 1999). How the rate of adaptation depends on N and $U_b$ is sensitive to the distribution of fitness
effects U(s). Generically, one finds that the rate of adaptation is $\propto (\log NU_b)^{\alpha}$, where $\alpha$ depends on the
properties of U(s) (Park et al. 2010).

FIG. 2 Adaptation in asexual populations. (a) If the distribution of beneficial mutations has a long tail, the population consists
of a small number of large clones and only the mutations with the largest effects have a chance of fixing. (b) If many mutations
of similar effect contribute to fitness diversity, the bulk of the fitness distribution can be described by a smooth function that is
roughly Gaussian in shape. There exists a fittest genotype in the population with no individuals to its right. Only mutations
close to this high fitness "nose" have an appreciable chance of fixing. The stochastic dynamics at the nose determines the
evolution of the entire population and the speed of the entire population, $v$, has to match the speed of the nose, $v_{nose}$, in a
quasi-steady state. The fixation probability $\phi(\chi, s)$ of a mutation with effect s increases with increasing background fitness as
sketched in panel (c). A mutant in the bulk of the fitness distribution has essentially zero chance of taking over the population
since many fitter individuals exist. In the opposite case when the mutant is the fittest in the population, $\phi(\chi, s)$ is proportional
to $\chi + s$ as we would expect in the absence of interference. Since there are very few individuals with very high fitness, most
mutations that fix come from a narrow region (light grey) where the product of $n(\chi)$ and $\phi(\chi, s)$, sketched in blue, peaks. Note
that $\chi$ is Malthusian or log-fitness. Scripts to illustrate interference and fixation can be found in the online supplement.



Clonal interference theory places all the emphasis on the mutation with the largest effect and ignores variation in 
genetic background or equivalently the possibility that multiple mutations accumulate in one lineage. It is therefore 
expected to work if the distribution of effect sizes has a long tail allowing for mutations of widely different sizes. It
fails if most mutations have similar effects on fitness. A careful discussion of the theory of clonal interference and its
limitations can be found in Park et al. (2010).



B. Genetic background and multiple mutations 

If most beneficial mutations have similar effects, a lineage cannot fix by acquiring a mutation with very large effect 
but has to accumulate more beneficial mutations than the competing lineages. If population sizes and mutation rates 
are large enough that many mutations segregate, the distribution n(x, t) of fitness x in the population is roughly
Gaussian, see Fig. 2B, and the problem becomes tractable (Desai and Fisher 2007; Rouzine et al. 2003; Tsimring
et al. 1996). More precisely, n(x, t) is governed by the deterministic equation



$$\frac{d}{dt} n(x,t) = (x - \bar{x})\, n(x,t) + \int U(s)\,\left[n(x - s,t) - n(x,t)\right] ds \tag{1}$$



where $(x - \bar{x})\,n(x, t)$ accounts for amplification by selection of individuals fitter than the fitness mean $\bar{x}$, and elimination
of the less fit ones. The second term accounts for mutations that move individuals from $x - s$ to $x$ at rate U(s).
Multiplying this equation by x and integrating over the fitness x yields Fisher's "Fundamental Theorem of Natural
Selection", which states that the rate of increase in mean fitness is



$$\frac{d}{dt}\bar{x} = \sigma^2 - \Delta_\mu \tag{2}$$



where $\sigma^2$ is the variance in fitness and $\Delta_\mu$ is the average mutation load a genome accumulates in one generation. A
steadily moving mean fitness $\bar{x} = vt$ suggests a traveling wave solution of the form $n(x, t) = n(\chi)$ where $\chi = x - \bar{x}$
is the fitness relative to the mean. Eq. (2) is analogous to the breeder's equation that links the response to selection
to additive variances and co-variances. In quantitative genetics, the trait variances are determined empirically and
often assumed constant, while we will try to understand how $\sigma^2$ is determined by a balance between selection and
mutation.
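The link between Eq. (1) and the rate of change of the mean fitness can be checked numerically. The sketch below is an illustration under stated assumptions, not the author's code: fitness is discretized on a grid, U(s) is replaced by two hypothetical mutation classes (one beneficial, one deleterious, with made-up rates and effects), and time is advanced with explicit Euler steps. Taking the first moment of Eq. (1) predicts that the mean changes at a rate given by the fitness variance plus the mean mutational effect $\int U(s)\, s\, ds$; the latter is negative when deleterious mutations dominate and then equals minus the mutation load.

```python
import numpy as np


def shifted(n, k):
    """Return m with m[i] = n[i - k], padding with zeros (k may be negative)."""
    m = np.zeros_like(n)
    if k > 0:
        m[k:] = n[:-k]
    elif k < 0:
        m[:k] = n[-k:]
    else:
        m[:] = n
    return m


def moments(n, x):
    """Mean and variance of fitness for a (not necessarily normalized) distribution n."""
    total = n.sum()
    mean = np.sum(x * n) / total
    var = np.sum((x - mean) ** 2 * n) / total
    return mean, var


def euler_step(n, x, mutations, dt):
    """One explicit Euler step of Eq. (1); mutations is a list of (rate, grid shift)."""
    mean, _ = moments(n, x)
    dn = (x - mean) * n                       # selection term
    for rate, k in mutations:
        dn += rate * (shifted(n, k) - n)      # mutation moves mass from x - s to x
    return n + dt * dn


def fisher_check(steps=200, dt=0.1):
    """Compare the observed change of the mean with variance plus mean mutational effect."""
    dx = 0.005
    x = np.arange(-100, 301) * dx             # fitness grid from -0.5 to 1.5
    n = np.exp(-0.5 * (x / 0.03) ** 2)        # start from a narrow Gaussian
    n /= n.sum()
    # Hypothetical U(s): s = +2*dx at rate 1e-3 and s = -4*dx at rate 1e-2.
    mutations = [(1e-3, 2), (1e-2, -4)]
    for _ in range(steps):
        n = euler_step(n, x, mutations, dt)
    mean0, var0 = moments(n, x)
    mean1, _ = moments(euler_step(n, x, mutations, dt), x)
    observed = (mean1 - mean0) / dt
    predicted = var0 + sum(rate * k * dx for rate, k in mutations)
    return observed, predicted


if __name__ == "__main__":
    observed, predicted = fisher_check()
    print(f"d(mean)/dt observed: {observed:.6e}, variance + mutational term: {predicted:.6e}")
```

The deterministic equation is only integrated for a short time here. As discussed in the text that follows, it does not describe the stochastic high fitness "nose", so long runs of this sketch should not be over-interpreted.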

To determine the average v, we need an additional relation between v and the mutational input. To this end, it 
is important to realize that the population is thinning out at higher and higher fitness and only very few individuals 
are expected to be present above some $\chi_c$ as sketched in Fig. 2B. The dynamics of this high fitness "nose" is very
stochastic and not accurately described by Eq. (1). However, the nose is the most important part where most successful
mutations arise. There have been two strategies to account for the stochastic effects and derive an additional relation
for the velocity. (i) The average velocity, $v_{nose}$, of the nose is determined by a detailed study of the stochastic
dynamics of the nose. At steady state, this velocity has to equal the average velocity of the mean fitness given by
Eq. (2), which produces the additional relation required to determine v (Brunet et al. 2008; Cohen et al. 2005a;
Desai and Fisher 2007; Goyal et al. 2012; Rouzine et al. 2003; Tsimring et al. 1996). (ii) Alternatively, assuming
additivity of mutations, v has to equal the average rate at which fitness increases due to fixed mutations (Good et al.
2012; Neher et al. 2010) (see Hallatschek (2011) for a related idea). I will largely focus on this latter approach,
as it generalizes to sexual populations below. In essence, we need to calculate the probability of fixation $\Phi(s, v)$ of
mutations with effect size s that arise in random individuals in the population. $\Phi$ depends on v and implicitly on the
traveling fitness distribution $n(x - vt)$. Using this notation, we can express v as the sum of effects of mutations that
fix per unit time:

$$v = \frac{d}{dt}\bar{x} = N \int U(s)\, \Phi(s,v)\, s\, ds \tag{3}$$



Note that the mutational input is proportional to the census population size N. To solve Eq. (3), we first have to
calculate the fixation probability $\Phi(s, v)$, which in turn is a weighted average of the fixation probability, $\phi(\chi, s)$, given
the mutation appears on a genetic background with relative fitness $\chi$. The latter can be approximated by branching
processes (Good et al. 2012; Neher et al. 2010). A detailed derivation of $\phi(\chi, s)$ is given in the supplement of Good
et al. (2012), while the subtleties associated with approximations are discussed in Fisher (2013). The qualitative
features of $\phi(\chi, s)$ are sketched in Fig. 2C.

The product $n(\chi)\phi(\chi, s)$ describes the distribution of backgrounds on which successful mutations arise. This
distribution is often narrowly peaked right below the high fitness nose (see Fig. 2C). Mutations on backgrounds with
lower fitness are doomed, while there are very few individuals with even higher background fitness. The larger s, the
broader this region is.

To determine the rate of adaptation, one has to substitute the results for $\Phi(s, v)$ into Eq. (3) and solve for v (Desai
and Fisher 2007; Good et al. 2012). A general consequence of the form of the self-consistency condition Eq. (3) is that
if $\Phi$ is weakly dependent on v, we will find v proportional to N. In this case the speed of evolution is proportional to
the mutational input. With increasing fitness variance, $\sigma^2$, the genetic background fitness starts to influence fixation
probabilities, such that eventually v increases only slowly with N. For models in which beneficial mutations of fixed
effect s arise at rate $U_b$, the rate of adaptation in large populations is given by



$$v \propto \begin{cases} \dfrac{s^2 \log Ns}{(\log U_b/s)^2} & s \gg U_b \\[2ex] (U_b s^2)^{2/3}\, (\log N D^{1/3})^{1/3} & s \ll U_b \end{cases} \tag{4}$$



(Cohen et al. 2005a; Desai and Fisher 2007). The above has assumed that s is constant, but these expressions hold
for more general models with a short-tailed distribution U(s) with suitably defined effective $U_b$ and s (Good et al.
2012).



Synthesis. Clonal interference and multiple mutation models both predict diminishing returns as the population size
increases, but the underlying dynamics are rather different. In the clonal interference picture, population take-overs
are driven by single mutations and the genetic background on which they occur is largely irrelevant ($\phi(\chi, s)$ depends
little on $\chi$). The mutations that are successful, however, have the very largest effects. In the multiple mutation regime,
the effect of the mutations is not that crucial, but they have to occur in very fit individuals to be successful ($\phi(\chi, s)$
increases rapidly with $\chi$). In both models, the speed of adaptation continues to increase slowly with the population
size and there is no hard "speed limit". Distinguishing a speed limit from diminishing returns in experiments is hard
(Miralles et al. 2000; de Visser et al. 1999).
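The diminishing returns with population size can be illustrated with a small forward simulation. The sketch below is not one of the FFPopSim scripts accompanying this review, and its modeling choices are assumptions made for illustration: an asexual Wright-Fisher population of fixed size, beneficial mutations of a single effect s arising at rate Ub per genome and generation, additive log-fitness, and hypothetical function names and parameter values. The simulated rate is compared with the no-interference expectation N·Ub·2s·s, which uses the classic fixation probability of about 2s in a Wright-Fisher population (other offspring-number models change this prefactor).

```python
import numpy as np


def adaptation_rate(N, Ub, s, generations=2000, max_class=300, seed=1):
    """Estimate the gain in log-fitness per generation in an asexual population.

    Individuals are grouped by their number k of beneficial mutations.
    """
    N = int(N)
    rng = np.random.default_rng(seed)
    k = np.arange(max_class)
    counts = np.zeros(max_class, dtype=np.int64)
    counts[0] = N
    mean_k = np.empty(generations)
    for t in range(generations):
        mean_now = np.sum(k * counts) / N
        # Selection: expected frequency is proportional to count * exp(s * (k - mean)).
        w = counts * np.exp(s * (k - mean_now))
        p = w / w.sum()
        # Mutation: a fraction Ub of each class gains one more beneficial mutation.
        moved = Ub * p[:-1]
        p[:-1] -= moved
        p[1:] += moved
        # Drift: resample exactly N offspring.
        counts = rng.multinomial(N, p)
        mean_k[t] = np.sum(k * counts) / N
    half = generations // 2
    # Discard the first half as burn-in and measure the slope of the mean fitness.
    return s * (mean_k[-1] - mean_k[half]) / (generations - 1 - half)


def no_interference_rate(N, Ub, s):
    """Rate expected if every established mutation fixed independently (assumes P_fix ~ 2s)."""
    return N * Ub * 2.0 * s * s


if __name__ == "__main__":
    s, Ub = 0.05, 1e-4
    for N in (10**3, 10**4, 10**5):
        v = adaptation_rate(N, Ub, s)
        print(f"N = {N:6d}: simulated v = {v:.2e}, no-interference v = {no_interference_rate(N, Ub, s):.2e}")
```

A single run per population size is noisy, so averaging over seeds would be needed for quantitative statements. The qualitative point is that a hundred-fold larger population adapts far less than a hundred times faster.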



Whether one or the other picture is more appropriate depends on the distribution of available mutations U(s). If
U(s) falls off faster than exponential, adaptation occurs via many small steps (Desai and Fisher 2007; Good et al.
2012); if the distribution is broader, the clonal interference picture is a reasonable approximation (Park and Krug
2007; Park et al. 2010). The borderline case of an exponential fitness distribution has been investigated more closely,
finding that large effect mutations on a pretty good background make the dominant contributions (Good et al.
2012; Schiffels et al. 2011), i.e., a little bit of both.

Empirical observations favor this intermediate situation. Influenza evolution has been analyzed in great detail and
it was found that a few rather than a single mutation drive the fixation of a particular strain (Strelkowa and Lässig
2012). Similarly, evolution experiments suggest that the genetic background is important, but a moderate number of
large effect mutations account for most of the observed adaptation (Lang et al. 2011).

Note the somewhat unintuitive dependence of v on parameters in Eq. (4). Instead of the mutational input $NU_b$
and s, v depends on $Ns$ and $U_b/s$ for $U_b \ll s$. In large populations, the dominant time scale of population turnover
is governed by selection and is of order $s^{-1}$. $Ns$ and $U_b/s$ measure the strength of reproduction noise (drift) and
mutations relative to $s^{-1}$, respectively (see Neher and Shraiman (2012) for a discussion of this issue in the context of
deleterious mutations). In large populations, the infinite sites model starts to break down and the same mutations can
occur independently in several lineages, limiting interference (Bollback and Huelsenbeck 2007; Kim and Orr 2005).



III. EVOLUTION OF FACULTATIVELY SEXUAL POPULATIONS 



Competition between beneficial mutations in asexuals results in a slow (logarithmic) growth of the speed of
adaptation with the population size N (Eq. (4)). How does gradually increasing the outcrossing rate alleviate this
competition? The associated advantages of sex and recombination have been studied extensively (Charlesworth 1993; Crow
and Kimura 1965; Fisher 1930; Muller 1932; Rice and Chippindale 2001). It is instructive to consider facultatively
sexual organisms that outcross at rate r, and in the event of outcrossing have many independently segregating loci.
Facultatively sexual species are common among RNA viruses, yeasts, nematodes, and plants.

[Figure 3 panels: A) infinitesimal model; B) emergence of clones; C) rate of adaptation; D) epistasis]

FIG. 3 A facultatively sexual lifecycle is common among many pathogens, plants, and some groups of animals. (a) If many
loci segregate independently, recombination can be modeled by the infinitesimal model. Given two parents with fitness $\chi_1$ and
$\chi_2$ sampled from the parental distribution with variance $\sigma^2$, offspring fitness is symmetrically distributed around the parental
mean with variance $\sigma^2/2$. A mutation, indicated as a red dot in the sketch, can thereby hop from an individual with one
background fitness to a very different one. (b) If the outcrossing rate is lower than the fitness of some individuals, clones,
indicated in red, can grow at rate $\chi - r$. As the population adapts, the growth rate of the clones is reduced, eventually goes
negative and the clone disappears. The beneficial mutation, however, persists on other backgrounds. In small populations, the
rate of adaptation increases linearly with the population size as sketched in panel (c). For each outcrossing rate, there is a point
beyond which interference starts to be important. (d) Epistasis causes condensation of the population into a small number of
very fit genotypes. Crosses between these genotypes result in unfit individuals. In the absence of forces that stabilize different
clones, one clone will rapidly take over if $\chi > r$. Scripts illustrating evolution of facultatively sexual populations can be found
in the online supplement.

Most of our theoretical understanding of evolution in large facultatively mating populations comes from models
similar to those introduced above for asexual populations. In addition to mutation, we have to introduce a term
that describes how an allele can move from one genetic background to another by recombination; see Fig. 3A. Given
the fitness values of the two parents $\chi_1$ and $\chi_2$ and assuming many independently segregating loci, the offspring
fitness $\chi$ is symmetrically distributed around the mid-parent value with half the population variance; see illustration
in Fig. 3A and (Bulmer 1980; Turelli and Barton 1994). To understand the process of fixation in such a population,
the following is a useful intuition: An outcrossing event places a beneficial mutation onto a novel genotype, which is
amplified by selection into a clone whose size grows rapidly with the fitness of the founder; see Fig. 3B. These clones
are transient, since even an initially fit clone falls behind the increasing mean fitness. However, large clones produce
many recombinant offspring (daughter clones), which greatly enhances the chance of fixation of mutations they carry.
Since clone size increases rapidly with founder fitness, the fixation probability $\phi(\chi, s)$ is still a very steep function of
the background fitness and qualitatively similar to the asexual case (Fig. 2C). With increasing outcrossing rate, the
fitness window from which successful clones originate becomes broader and broader.
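Both ingredients of this picture, outcrossing under the infinitesimal model and the transient growth of clones, can be written down in a few lines. The sketch below is an illustration under stated assumptions, not the author's implementation: parental fitness is taken to be Gaussian, the clone is treated deterministically, its relative fitness is assumed to decline linearly at the rate of adaptation v, and the function names are hypothetical. Under these assumptions a clone founded at relative fitness chi0 grows at rate chi0 - r - v t, so its size peaks at t = (chi0 - r)/v with a height of exp((chi0 - r)^2 / (2 v)), which makes explicit how steeply clone size depends on founder fitness.

```python
import numpy as np


def outcross(parent_a, parent_b, sigma2, rng):
    """Offspring fitness: mid-parent value plus Gaussian segregation noise of variance sigma2/2."""
    midparent = 0.5 * (np.asarray(parent_a) + np.asarray(parent_b))
    segregation = rng.normal(0.0, np.sqrt(sigma2 / 2.0), size=midparent.shape)
    return midparent + segregation


def clone_size(t, chi0, r, v):
    """Deterministic size of a clone started by one individual with relative fitness chi0.

    The growth rate chi0 - v*t - r declines as the mean fitness advances at rate v.
    """
    return np.exp((chi0 - r) * t - 0.5 * v * t ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sigma2 = 0.01
    a = rng.normal(0.0, np.sqrt(sigma2), size=100000)
    b = rng.normal(0.0, np.sqrt(sigma2), size=100000)
    child = outcross(a, b, sigma2, rng)
    print(f"parental variance {a.var():.4f}, offspring variance {child.var():.4f}")
    for chi0 in (0.05, 0.10, 0.15):
        t_peak = (chi0 - 0.02) / 0.001
        print(f"chi0 = {chi0:.2f}: peak clone size = {clone_size(t_peak, chi0, 0.02, 0.001):.1f}")
```

With random mating among Gaussian parents, the offspring variance equals the parental variance: half comes from the spread of mid-parent values and half from segregation.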

If outcrossing rates are large enough that genotypes are disassembled by recombination faster than selection can
amplify them, $\phi(\chi, s)$ is essentially flat and the genetic background does not matter much. This transition was
examined by Neher et al. (2010):

$$v \approx \begin{cases} \dfrac{2 r^2 \log(N U_b)}{(\log r/s)^2} & r \ll \sqrt{N U_b s^2} \\[2ex] N U_b s^2 & r \gg \sqrt{N U_b s^2} \end{cases} \tag{5}$$



The essence of this result is that adaptation is limited by recombination whenever r is smaller than the standard
deviation in fitness in the absence of interference. In this regime, v depends weakly on N, but increases rapidly with
r. This behavior is sketched in Fig. 3C. Similar results can be found in Weissman and Barton (2012). The above
analysis assumed that recombination is rare, but still frequent enough to ensure that mutations that rise to high
frequencies are essentially in linkage equilibrium. This requires $r \gg s$. Rouzine and Coffin (2005, 2010) studied the
selection on standing variation at intermediate and low recombination rates. Adaptation in presence of horizontal
gene transfer was investigated by Cohen et al. (2005b), Wylie et al. (2010), and Neher et al. (2010).
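To get a feeling for the two regimes of Eq. (5), the asymptotic expressions can simply be evaluated. The sketch below assumes my reading of Eq. (5), namely v ≈ 2 r² log(N U_b) / (log r/s)² for r well below sqrt(N U_b s²) and v ≈ N U_b s² well above it. The function names and parameter values are hypothetical, and neither expression is meant to be accurate near the crossover or when r is not much larger than s.

```python
import math


def crossover_rate(N, Ub, s):
    """sqrt(N*Ub*s^2): standard deviation in fitness in the absence of interference."""
    return math.sqrt(N * Ub) * s


def unlimited_rate(N, Ub, s):
    """High-recombination branch: adaptation proportional to the mutational input."""
    return N * Ub * s * s


def interference_limited_rate(N, Ub, s, r):
    """Low-recombination branch; assumes s << r << sqrt(N*Ub*s^2)."""
    return 2.0 * r * r * math.log(N * Ub) / math.log(r / s) ** 2


def regime(N, Ub, s, r):
    """Crude label for which asymptotic branch applies (sharp cutoff is an assumption)."""
    return "recombination-limited" if r < crossover_rate(N, Ub, s) else "mutation-limited"


if __name__ == "__main__":
    Ub, s, r = 1e-3, 1e-3, 0.05
    for N in (1e8, 1e10):
        print(
            f"N = {N:.0e}: {regime(N, Ub, s, r)}, "
            f"v = {interference_limited_rate(N, Ub, s, r):.2e} "
            f"versus {unlimited_rate(N, Ub, s):.2e} without interference"
        )
```

In the recombination-limited branch, a hundred-fold increase in N changes v only through log(N U_b), whereas doubling r increases v by more than a factor of two for these parameter values.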



In contrast to asexual evolution, epistasis can dramatically affect the evolutionary dynamics in sexual populations. 
Epistasis implies that the effect of mutations depends on the state at other loci in the genome. In the absence of 
sex, the only quantity that matters is the distribution of available mutations, U(s). The precise nature of epistasis 
is not crucial. In sexual populations, however, epistasis can affect the evolutionary dynamics dramatically: When 
different individuals mix their genomes, it matters whether mutations acquired in different lineages are compatible. 



Since selection favors well adapted combinations of alleles, recombination is expected to be on average disruptive and
recombinant offspring have on average lower fitness than their parents (the so-called "recombination load"). This
competition between selection for good genotypes and recombination can result in a condensation of the population
into fit clones; see Fig. 3D, Neher and Shraiman (2009), and Neher et al. (2013).



IV. SELECTIVE INTERFERENCE IN OBLIGATELY SEXUAL ORGANISMS 



Selective interference has historically received most attention in obligately sexual organisms most relevant to crop
and animal breeding. Artificial selection has been performed by farmers and breeders for thousands of years with
remarkable success (Hill and Kirkpatrick 2010). Evolution experiments with diverse species, including chicken, mice
and Drosophila, have shown that standing variation at a large number of loci responds to diverse selection pressures
(Burke et al. 2010; Chan et al. 2012; Johansson et al. 2010; Turner et al. 2011; Zhou et al. 2011); see Burke
(2012) for a recent review. In obligately sexual populations, distant loci can respond independently to selection and
remain in approximate linkage equilibrium. The frequencies of different alleles change according to their effect on
fitness averaged over all possible fitness backgrounds in the population. Small deviations from linkage equilibrium
can be accounted for perturbatively using the so-called Quasi-Linkage Equilibrium (QLE) approximation (Barton and
Turelli 1991; Kimura 1965; Neher and Shraiman 2011a).



[Figure 4 panels: A) selective interference; B) selective sweep]

FIG. 4 Interference in obligately sexual populations. Panel (a) sketches the interference effects of selective sweeps through
time (vertical axis) and along the genome (horizontal axis). A sweeping mutation with selection coefficient s interferes with
other mutations in a region of width $s/\rho$ over a time $s^{-1}$, where $\rho$ is the crossover rate per base. The extent of interference
is sketched by grey bulges, each of which corresponds to a mutation that fixed. Interference starts to be important when the
bulges overlap. Since the area of the bulges, roughly "height x width", is approximately independent of s, interference depends
on $\rho$ and the rate of sweeps rather than the effect size. The rate of adaptation is therefore primarily a function of the map length
R. (b) A selective sweep reduces neutral genetic variation in a region of width $s/(\rho \log(Ns))$. The effect of sweeps on neutral
diversity is explored in the online supplement.
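The scaling argument in the caption of Fig. 4 amounts to simple arithmetic, sketched below as an order-of-magnitude illustration rather than a quantitative model. The function names and parameter values are hypothetical, and the overlap criterion in `expected_overlaps` is an assumption: it counts how many other sweeps are expected to start within the space-time area of one bulge, which equals the genome-wide rate of sweeps divided by the map length R.

```python
import math


def interference_footprint(s, rho):
    """Width (bases), duration (generations) and area of the region a sweep occupies."""
    width = s / rho
    duration = 1.0 / s
    return width, duration, width * duration


def diversity_footprint(s, rho, N):
    """Width (bases) of the region in which a sweep reduces neutral diversity."""
    return s / (rho * math.log(N * s))


def expected_overlaps(sweeps_per_generation, map_length):
    """Sweeps starting per bulge area; values near or above 1 signal interference (assumption)."""
    return sweeps_per_generation / map_length


if __name__ == "__main__":
    rho = 1e-8  # hypothetical crossover rate per base and generation
    for s in (0.001, 0.01, 0.1):
        width, duration, area = interference_footprint(s, rho)
        print(f"s = {s}: width {width:.0e} bases, duration {duration:.0f} generations, area {area:.0e}")
```

Stronger sweeps occupy a wider region for a shorter time, so the area of a bulge stays at 1/rho regardless of s in this approximation.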



This approximate independence, however, does not hold for loci that are tightly linked. Hill and Robertson (1966)
observed that interference between linked competing loci can slow down the response to selection, an effect now
termed Hill-Robertson interference (Felsenstein 1974). Felsenstein realized that interference is not restricted to
competing beneficial mutations but that linked deleterious mutations also impede fixation of beneficial mutations
(see background selection below). The term Hill-Robertson interference is now used for any reduction in the efficacy
of selection caused by linked fitness variation. A deeper understanding of selective interference was gained in the
1990s (Barton 1994, 1995b). The key insight of Barton was to calculate the fate of a novel mutation considering
all possible genetic backgrounds on which it can arise and summing over all possible trajectories it can take through
the population. For a small number of loci, the equations describing the probability of fixation can be integrated
explicitly.

Weakly linked sweeps cause a cumulative reduction of the fixation probability at a focal site that is roughly given
by the ratio of additive variance in fitness and the squared degree of linkage (Barton 1995b; Santiago and Caballero
1998). Barton (1994) further identified a critical rate of strong selective sweeps that effectively prevents the fixation
of mutations with an advantage smaller than $s_c$. If sweeps are too frequent, the weakly selected mutation has little
chance of spreading before its frequency is reduced again by the next strong sweep.

At short distances, selective sweeps impede each other's fixation more strongly. This interference is limited to a
time interval of order $s^{-1}$ generations during which one of the sweeping mutations is at intermediate frequencies.
During this time, a new beneficial mutation will often fall onto the wildtype background and is lost again if it is not
rapidly recombined onto the competing sweep. The latter is likely only if it is further than $s/\rho$ nucleotides away
from the competing sweep, where $\rho$ is the crossover rate per basepair (Barton 1994). In other words, a sweeping
mutation with effect $s$ prevents other sweeps in a region of width $s/\rho$, and occupies this chromosomal "real estate"
for a time $s^{-1}$; see Fig. 4A (Weissman and Barton 2012). Hence strong sweeps briefly interfere with other sweeps in a
large region, while weak sweeps affect a narrow region for a longer time. The amount of interference is therefore roughly
independent of the strength of the sweeps, and the total number of sweeps per unit time is limited by the map length
$R = \int \rho(y)\, dy$, where the integral is over the entire genome and $\rho(y)$ is the local crossover rate. Larger
populations can squeeze slightly more sweeps into $R$ (Weissman and Barton 2012). In most obligately sexual organisms,
sweeps rarely cover more than a few percent of the total map length such that recombination is not limiting adaptation
unless sweeps cluster in certain regions (Sella et al. 2009). However, as I will discuss below, even rare selective sweeps
have dramatic effects on neutral diversity.
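
The "real estate" argument can be checked with a few lines of arithmetic. The Python sketch below is only an illustration: the function name `sweep_footprint`, the assumption of a uniform crossover rate `rho` per base pair, and all numerical values are chosen for the example and are not taken from the cited studies. It shows that the blocked width times the blocked duration equals 1/rho for any selection coefficient, so that of order R non-overlapping sweeps fit into one generation.

```python
import math

def sweep_footprint(s, rho):
    """Region blocked by one sweep: (width in bp, duration in generations, area)."""
    width = s / rho        # other sweeps are impeded within s/rho nucleotides
    duration = 1.0 / s     # for roughly 1/s generations
    return width, duration, width * duration

rho = 1e-8                 # assumed crossover rate per bp per generation
genome_length = 1e8        # assumed genome size in bp
map_length = rho * genome_length   # R = integral of rho(y) dy for uniform rho

for s in (0.001, 0.01, 0.1):
    width, duration, area = sweep_footprint(s, rho)
    print(f"s={s}: width={width:.1e} bp, duration={duration:.0f} gen, area={area:.1e}")

# Each sweep occupies an area of about 1/rho in the (genome x time) plane, so the
# number of non-overlapping sweeps per generation is of order genome_length * rho.
max_sweeps_per_generation = genome_length / sweep_footprint(0.01, rho)[2]
print(f"R = {map_length:.2f}, rough sweep limit per generation = {max_sweeps_per_generation:.2f}")
```

This is an order-of-magnitude estimate only; it ignores the weak dependence on population size mentioned above.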



V. GENETIC DIVERSITY, DRAFT, AND COALESCENCE 



Interference between selected mutations reduces the fixation probability of beneficial mutations, slows adaptation, 
and weakens purifying selection. These effects are very important, but hard to observe since significant adaptation 
often takes longer than our window of observation. Typically, data consists of a sample of sequences from a population. 
These sequences differ by single nucleotide polymorphisms, insertions, or deletions, and we rarely know the effect of 
these differences on the organism's fitness. 

From a sequence sample of this sort, the genealogy of the population is reconstructed and compared to models of
evolution, in most cases a neutral model governed by Kingman's coalescent (Kingman 1982). From this comparison
we hope to learn about evolutionary processes. However, linked selection, be it in asexual organisms, facultatively
sexuals, or obligately sexuals, has dramatic effects on the genealogies. Substantial effects on neutral diversity are
observed at rates of sweeps that do not yet cause strong interference between selected loci for the simple reason that
neutral alleles segregate for longer times (Weissman and Barton 2012).



A. Genetic draft in obligately sexual populations 



Selective sweeps reduce linked neutral diversity (Barton 1998; Barton and Etheridge 2004; Kaplan et al. 1989; Maynard
Smith and Haigh 1974; Stephan et al. 1992; Wiehe and Stephan 1993). A sweeping mutation takes about
$t_{sw} \approx s^{-1} \log Ns$ generations to rise to high frequency. Linked neutral variation
is preserved only when substantial recombination happens during this time. Given a crossover rate $\rho$ per base,
recombination will separate the sweep from a locus at distance $l$ with probability $r = \rho l$ per generation (assuming
$r \ll 1$). Hence a sweep leaves a dip of width $l = (\rho t_{sw})^{-1} \approx s/(\rho \log Ns)$ in the neutral diversity
(see Fig. 4B). Within this region, selection causes massive and rapid coalescence and only a fraction of the lineages
continue into the ancestral population (see Fig. 5A). This effect has been further investigated by Durrett and
Schweinsberg (2005), who showed that the effect of recurrent selective sweeps is well approximated by a coalescent process
that allows for multiple mergers: each sweep forces the almost simultaneous coalescence of a large number of lineages (a
fraction $e^{-r t_{sw}}$). Similar arguments had been made previously by Gillespie (2000), who called the stochastic force
responsible for coalescence genetic draft. Coop and Ralph (2012) extended the analysis of Durrett and Schweinsberg to
partial sweeps that could be common in structured populations, with over-dominance, or frequency dependent selection.
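
The scaling of the sweep duration, the width of the diversity dip, and the fraction of lineages caught in the burst of coalescence can be evaluated directly. The following Python sketch is a minimal illustration under stated assumptions: the function names and the parameter values are hypothetical, `rho` denotes the crossover rate per base, and the expressions are the order-of-magnitude formulas quoted above rather than exact results.

```python
import math

def sweep_time(s, N):
    """Approximate time for a sweep to rise to high frequency: log(N s) / s."""
    return math.log(N * s) / s

def dip_width(s, N, rho):
    """Distance at which about one crossover occurs during the sweep."""
    return 1.0 / (rho * sweep_time(s, N))

def merged_fraction(distance, s, N, rho):
    """Fraction of lineages forced to coalesce: exp(-r t_sw) with r = rho * distance."""
    return math.exp(-rho * distance * sweep_time(s, N))

s, N, rho = 0.01, 1e6, 1e-8   # illustrative values only
width = dip_width(s, N, rho)
print(f"sweep time ~ {sweep_time(s, N):.0f} generations, dip width ~ {width:.2e} bp")
for fraction_of_width in (0.0, 0.5, 1.0, 3.0):
    d = fraction_of_width * width
    print(f"distance {d:.2e} bp: merged fraction {merged_fraction(d, s, N, rho):.3f}")
```

At the sweep site all lineages merge, at one dip width the merged fraction has dropped to 1/e, and a few dip widths away the sweep leaves the genealogy almost unaffected.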

The rapid coalescence of multiple lineages is unexpected in the standard neutral coalescent (a merger of $p$ lineages
occurs with probability $\propto N^{-p}$). In coalescence induced by a selective sweep, however, multiple mergers are
common and dramatically change the statistical properties of genealogies. A burst of coalescence corresponds to a portion
of the tree with almost star-like shape (Slatkin and Hudson 1991). Alleles that arose before the burst are common, those
after the burst rare. This causes a relative increase of rare alleles, as well as alleles very close to fixation
(Braverman et al. 1995; Fay and Wu 2000; Gillespie 2000).

FIG. 5 Coalescence driven by selection (panel titles: A, hard sweep; B, soft sweep). (a) A selective sweep (grey region)
causes rapid coalescence of lineages at a nearby locus. Each sweep causes a fraction of lineages to merge, while the
remainder recombines onto an ancestral background. (b) Soft sweeps refer to a scenario where single mutations arise
multiple times independently in response to environmental change. This is expected as soon as the product of $N$ and the
per-site mutation rate exceeds one and can result in multiple bursts of coalescence almost at the same time. (c) A
genealogical tree drawn from a simulation of a model of rapidly adapting asexual organisms. Coalescence often occurs in
bursts. Furthermore, branching is often uneven. At many branchings in this "ladderized" tree, most individuals descend
from the left branch. Those are well-known features of multiple merger coalescence processes such as the
Bolthausen-Sznitman coalescent. (d) Coalescence and fitness classes. Most population samples consist of individuals from
the center of the fitness distribution, while their distant ancestors were among the fittest. In large populations, most
coalescence happens in the high fitness nose and the time until ancestral lineages "arrive" in the nose corresponds to
long terminal branches (compare panel c). How genealogies depend on selection can be studied using simulations, see the
online supplement.



The degree to which linked selective sweeps reduce genetic diversity depends primarily on the rate of sweeps per
map length (Weissman and Barton 2012). In accord with this expectation, it is found that diversity increases with
recombination rate and decreases with the density of functional sites (Begun et al. 2007; Shapiro et al. 2007). In
addition to occasional selective sweeps, genetic diversity and the degree of adaptation can be strongly affected by a
large number of weakly selected sites, e.g. weakly deleterious mutations, that generate a broad fitness distribution
(McVean and Charlesworth 2000).



B. Soft sweeps 



Soft sweeps refer to events when a selective sweep originates from multiple genomic backgrounds (Hermisson and
Pennings 2005; Pennings and Hermisson 2006), either because the favored allele arose independently multiple times
or because it has been segregating for a long time prior to an environmental change. Soft sweeps have recently been
observed in pesticide resistance of Drosophila (Karasov et al. 2010) and are a common phenomenon in viruses with
high mutation rates.

A genealogy of individuals sampled after a soft sweep is illustrated in Fig. 5B. The majority of the individuals trace
back to one of two or more ancestral haplotypes on which the selected mutation arose. Hence coalescence is again
dominated by multiple merger events, except that several of those events happen almost simultaneously. This type
of coalescent process has been described in Schweinsberg (2000).

Despite dramatic effects on genealogies, soft sweeps can be difficult to detect by standard methods that scan for
selective sweeps. Those methods use local reductions in genetic diversity, which can be modest if the population traces
back to several ancestral haplotypes. The number of ancestral haplotypes in a sample after a soft sweep depends
on the product of $N$, the per-site mutation rate $\mu$, and selection against the allele before the sweep (Pennings
and Hermisson 2006). To detect soft sweeps, methods are required that explicitly search for signatures of rapid
coalescence into several lineages in linkage disequilibrium or haplotype patterns (Messer and Neher 2012; Pennings
and Hermisson 2006).
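
To get a feeling for how the number of ancestral haplotypes grows with the population-scaled mutation rate, one can treat independent origins of the favored allele like mutations in the Ewens sampling formula. The Python sketch below rests on that assumption and is not a restatement of the cited analysis: `theta` stands for a quantity proportional to the product of population size and per-site mutation rate (the prefactor depends on ploidy and model conventions), the function names are hypothetical, and selection against the allele before the sweep is ignored.

```python
def prob_single_origin(n, theta):
    """Ewens probability that all n sampled copies trace back to one origin."""
    p = 1.0
    for i in range(1, n):
        p *= i / (i + theta)
    return p

def expected_origins(n, theta):
    """Expected number of distinct origins in a sample of n under the Ewens formula."""
    return sum(theta / (theta + i) for i in range(n))

n = 20  # illustrative sample size
for theta in (0.01, 0.1, 1.0, 10.0):
    p_soft = 1.0 - prob_single_origin(n, theta)
    print(f"theta={theta}: P(soft in sample)={p_soft:.3f}, "
          f"expected origins={expected_origins(n, theta):.2f}")
```

Under these assumptions, sweeps in a sample are almost always hard for `theta` much smaller than one and almost always soft once `theta` exceeds one, in line with the criterion that the product of $N$ and the mutation rate exceeds one.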



C. The Bolthausen-Sznitman coalescent and rapidly adapting populations 



Individual selective sweeps have an intuitive effect on genetic diversity, but what do genealogies look like when
many mutations are competing in asexual or facultatively sexual populations? It has recently been argued that the
genealogies of populations in many models of rapid adaptation are well described by coalescent processes with multiple
mergers (Berestycki 2009; Pitman 1999). This was first discovered by Brunet et al. (2007), who studied a model
where a population expands its range. The genealogies of individuals at the front are described by the Bolthausen-Sznitman
coalescent, a special case of coalescent processes with multiple mergers. Recently, it has been shown that
a similar coalescent process emerges in models of adaptation in panmictic populations (Desai et al. 2012; Neher and
Hallatschek 2013).

Fig. 5C shows a tree sampled from a model of a rapidly adapting population. A typical sample from a rapidly
adapting population will consist of individuals from the center of the fitness distribution. Their ancestors tend to
be among the fittest in the population (Hermisson et al. 2002; Rouzine and Coffin 2007). Substantial coalescence
happens only once the ancestral lineages have reached the high fitness tip, resulting in long terminal branches of the
trees. Once in the tip, coalescence is driven by the competition of lineages against each other and happens in bursts
whenever one lineage gets ahead of everybody else. These bursts correspond to the event that a large fraction of the
population descends from one particular individual. These coalescent events have approximately the same statistics
as neutral coalescent processes with very broad but non-heritable offspring distributions (Der et al. 2011; Eldon and
Wakeley 2006; Schweinsberg 2003).



In the case of rapidly adapting asexual populations, the effective distribution of the number $n$ of offspring is given by
$P(n) \sim n^{-2}$, which gives rise to the Bolthausen-Sznitman coalescent. This type of distribution seems to be universal
to populations in which individual lineages are amplified while they diversify and is found in facultatively sexual
populations (Neher and Shraiman 2011b), asexual populations adapting by small steps, as well as populations in a
dynamic balance between deleterious and beneficial mutations. Asymptotic features of the site frequency spectrum
can be derived analytically (Berestycki 2009; Desai et al. 2012; Neher and Hallatschek 2013). One finds that the
frequency spectrum diverges as $f(\nu) \sim \nu^{-2}$ at low frequencies, corresponding to many singletons. Furthermore,
neutral alleles close to fixation are common, with $f(\nu)$ diverging again as $\nu \to 1$. This relative excess of rare
and very common alleles is a consequence of multiple mergers, which produce star-like sub-trees, and of the very asymmetric
branching at nodes deep in the tree (compare Fig. 5C).

The time scale of coalescence, and with it the level of genetic diversity, is mostly determined by the strength of 
selection and only weakly increases with population size. Essentially, the average time to a common ancestor of two 
randomly chosen individuals is given by the time it takes until the fittest individuals dominate the population. In 
most models, this time depends only logarithmically on the population size N. 
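
The difference between pairwise and multiple-merger genealogies can be explored with a small simulation. The Python sketch below is a minimal illustration, not the simulation code behind Fig. 5: the function names are hypothetical, and it relies on the standard merger rates of the Bolthausen-Sznitman coalescent, $\lambda_{b,k} = (k-2)!\,(b-k)!/(b-1)!$ for a given set of $k$ out of $b$ lineages, which imply that a merger of size $k$ occurs at total rate $b/(k(k-1))$ and that some merger occurs at rate $b-1$. Time is measured in units of the pair coalescence time in both models.

```python
import random

def bsc_merger_probs(b):
    """Probability that the next event merges k of b lineages, for k = 2..b.

    The total rate of k-mergers is b / (k (k - 1)); summed over k it equals b - 1.
    """
    return [b / ((b - 1) * k * (k - 1)) for k in range(2, b + 1)]

def simulate_merger_sizes(n, model, rng):
    """Trace n lineages back to their common ancestor.

    Returns the list of merger sizes and the time to the most recent common ancestor.
    """
    b, t, sizes = n, 0.0, []
    while b > 1:
        if model == "kingman":
            t += rng.expovariate(b * (b - 1) / 2)   # only pairwise mergers
            k = 2
        else:  # "bsc"
            t += rng.expovariate(b - 1)
            k = rng.choices(range(2, b + 1), weights=bsc_merger_probs(b))[0]
        sizes.append(k)
        b -= k - 1          # k lineages are replaced by their common ancestor
    return sizes, t

rng = random.Random(1)
for model in ("kingman", "bsc"):
    sizes, t = simulate_merger_sizes(50, model, rng)
    print(model, "merger events:", len(sizes), "largest merger:", max(sizes))
```

In the Kingman case a sample of 50 always needs 49 binary mergers, whereas the Bolthausen-Sznitman genealogy reaches the common ancestor in fewer events, some of which join many lineages at once.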



D. Background selection and genetic diversity 



Background selection refers to the effect of purifying selection on linked loci, which is particularly important if
linked regions are long. If deleterious mutations incur a fitness decrement of $s$ and arise with genome-wide rate
$U_d$, a sufficiently large population settles in a state where the number of mutations in individuals follows a Poisson
distribution with mean $\lambda = U_d/s$ (Haigh 1978). Individuals loaded with many mutations are selected against, but
continually produced by de novo mutations. All individuals in the population ultimately descend from individuals
carrying the fewest deleterious mutations. Within this model, the least loaded class has size $N\exp(-U_d/s)$ and
coalescence in this class is accelerated by $\exp(U_d/s)$ compared to a neutrally evolving population of size $N$
(Charlesworth et al. 1993). For large ratios $U_d/s$, the Poisson distribution of background fitness spans a large number
of fitness classes and this heterogeneity substantially reduces the efficacy of selection (McVean and Charlesworth 2000).
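
The quantities in this mutation-selection balance are easy to evaluate numerically. The following Python sketch is an illustration with assumed parameter values and hypothetical function names; it only evaluates the Poisson load distribution with mean $U_d/s$ and the size of the least loaded class, $N\exp(-U_d/s)$, quoted above.

```python
import math

def load_distribution(Ud, s, kmax):
    """Poisson probabilities of carrying k = 0..kmax deleterious mutations."""
    lam = Ud / s
    return [math.exp(-lam) * lam**k / math.factorial(k) for k in range(kmax + 1)]

def least_loaded_class_size(N, Ud, s):
    """Number of individuals in the mutation-free class: N exp(-Ud / s)."""
    return N * math.exp(-Ud / s)

def coalescence_speedup(Ud, s):
    """Factor by which coalescence is accelerated relative to a neutral population of size N."""
    return math.exp(Ud / s)

N, Ud, s = 1e6, 0.1, 0.02   # illustrative values, giving a mean load of 5 mutations
dist = load_distribution(Ud, s, 60)
print(f"mean load = {Ud / s:.1f}, fraction in the least loaded class = {dist[0]:.4f}")
print(f"least loaded class size = {least_loaded_class_size(N, Ud, s):.0f}")
print(f"coalescence accelerated by a factor {coalescence_speedup(Ud, s):.0f}")
```

With these example values fewer than one percent of individuals are mutation-free, yet all future ancestors are drawn from this small class.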



The effect of background selection is best appreciated in a genealogical picture. Genetic backgrounds sampled from
the population tend to come from the center of the distribution. Since the deleterious mutations they carry were
accumulated in the recent past, lineages "shed" mutations as we trace them back in time until they arrive in the
mutation-free class, akin to Fig. 5D. The resulting genealogical process, a fitness class coalescent, has been described
in Walczak et al. (2012). A recent study on the genetic diversity of whale lice (Seger et al. 2010) suggests that
purifying selection and frequent deleterious mutations can severely distort genealogies. O'Fallon et al. (2010) present
methods for the analysis of sequence samples under purifying selection.

The fitness class coalescent is appropriate as long as Muller's ratchet does not yet click. More generally, fixation of
deleterious mutations, adaptation, and environmental change will balance approximately. It has been shown that a
small fraction of beneficial mutations can be sufficient to halt Muller's ratchet (Goyal et al. 2012). In this dynamic
balance between frequent deleterious and rare beneficial mutations, the genealogies tend to be similar to genealogies
under rapid adaptation discussed above.



VI. CONCLUSIONS AND FUTURE DIRECTIONS 



Contradicting neutral theory, genetic diversity correlates only weakly with population size (Leffler et al. 2012),
suggesting that linked selection or genetic draft are more important than conventional genetic drift. Draft is most
severe in asexual populations, for which models predict that the fitness differences rather than the population size
determine the level of neutral diversity. As outcrossing becomes more frequent, the strength of draft decreases and
diversity increases. With increasing coalescence times, selection becomes more efficient as there is more time to
differentiate deleterious from beneficial alleles. In obligately sexual populations, most interference is restricted to
tightly linked loci and the number of sweeps per map length and generation determines genetic diversity.

Since interference slows adaptation, one expects that adaptation can select for higher recombination rates
(Charlesworth 1993). Indeed, positive selection results in indirect selection on recombination modifiers (Barton
1995a; Barton and Otto 2005; Hartfield et al. 2010; Otto and Barton 1997). Changing frequencies of outcrossing
have been observed in evolution experiments (Becks and Agrawal 2010). However, the evolution of recombination
and outcrossing rates in rapidly adapting populations remains poorly understood, both theoretically and empirically.

The traveling wave models discussed above assume a large number of polymorphisms with similar effects on fitness
and a smooth fitness distribution, which are drastic idealizations. More typically, one finds a handful of polymorphisms
with a distribution of effects (Barrick et al. 2009; Lang et al. 2011; Strelkowa and Lassig 2012). Simulations
indicate, however, that statistical properties of genealogies are rather robust regarding model assumptions as long
as draft dominates over drift (Neher and Hallatschek 2013). Appropriate genealogical models are prerequisite for
demographic inference. If, for example, a neutral coalescent model is used to infer the population size history of a
rapidly adapting population, one would conclude that the population has been expanding. Incidentally, this is inferred
in most cases. Some progress towards incorporating the effect of purifying selection into estimates from reconstructed
genealogies has been made recently (Nicolaisen and Desai 2012; O'Fallon 2011). Alternative genealogical models
accounting for selection should be included into popular analysis programs such as BEAST (Drummond and Rambaut
2007).

It is still common to assign an "effective" size, $N_e$, to various populations. In most cases, $N_e$ is a proxy for genetic
diversity, which depends on the time to the most recent common ancestor. With the realization that coalescence
times depend on linked selection and genetic draft, rather than the population size and genetic drift, the term should
be avoided and replaced by $T_c$, the time scale of coalescence. Defining $N_e$ suggests that the neutral model is valid
as long as $N_e$ is used instead of $N$. We have seen multiple times that drift and draft are of rather different natures
and that this difference cannot be captured by a simple rescaling. Each quantity then requires its own private $N_e$,
rendering the concept essentially useless. Some quantities like site frequency spectra are qualitatively different and
no $N_e$ maps them to a neutral model. The (census) population size is nevertheless important in discovering beneficial
mutations. For this reason, large populations are expected to respond more quickly to environmental change, as we are
painfully aware in the case of antibiotic resistance of pathogens. Large populations might therefore track phenotypic
optima more closely, resulting in beneficial mutations with smaller effect, which in turn might explain their greater
diversity.

The majority of models discussed assume a time invariant fitness landscape. This assumption reflects our ignorance 
regarding the degree and timescale of environmental fluctuations (for work on selection in time-dependent fitness 
landscapes, see Mustonen and Lassig (2009)). Time-variable selection pressures, combined with spatial variation, 
could potentially have strong effects. Similarly, frequency-dependent selection and more generally the interaction of
evolution with ecology are important avenues for future work. The challenge consists of choosing useful models that 
are tractable, appropriate, and predictive. 
Appendix A: Glossary

• Genetic drift: stochastic changes in allele frequencies due to non-heritable variation in offspring number.

• Purifying selection: selection against deleterious mutations.

• Positive selection: selection for novel beneficial mutations.

• Genetic draft: changes in allele frequencies due to (partly) heritable random associations with genetic
backgrounds.

• Hitchhiking: rapid rise in frequency through an association with a very fit background.

• Selective interference: reduction of fixation probability through competition with other beneficial alleles.

• Clonal interference: competition between well adapted asexual subpopulations from which only one
subpopulation emerges as winner.

• Branching process: stochastic model of reproducing and dying individuals without a constraint on the overall
population size.

• Epistasis: background dependence of the effect of mutations. Epistasis can result in rugged fitness landscapes.

• Kingman coalescent: basic coalescence process where random pairs of individuals merge.

• Multiple merger coalescent: coalescent process with simultaneous merging of more than 2 lineages.

• Bolthausen-Sznitman coalescent (BSC): special multiple merger coalescent which approximates genealogies
in many models of adaptation.



